feat: pipeline accepts vep.tar.gz or vep/ dir by emmcauley · Pull Request #105 · fulcrumgenomics/twistcgp

emmcauley · 2026-05-07T15:39:32Z

Closes #96.

We can't use the nf-core UNTAR module because it applies --strip-components 1 when extracting, which strips the top-level directory from the archive. VEP requires the species/ subdirectory to be present under the cache root. We don't run VEP as part of the pipeline tests; I picked up on this with Claude.
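To make the stripping behaviour concrete, here is a minimal sketch that only shows what `--strip-components=1` does. It assumes a hypothetical archive whose top-level entry is the species directory; the directory names imitate a VEP cache but hold no real data, and the temp paths are illustrative only.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing here is a real VEP cache.
strip_demo=$(mktemp -d)
mkdir -p "$strip_demo/src/homo_sapiens/113_GRCh38"
touch "$strip_demo/src/homo_sapiens/113_GRCh38/info.txt"

# Assumption: the archive's single top-level entry is the species directory.
tar -czf "$strip_demo/vep.tar.gz" -C "$strip_demo/src" homo_sapiens

# Plain extraction keeps every path component stored in the archive.
mkdir -p "$strip_demo/plain"
tar -xzf "$strip_demo/vep.tar.gz" -C "$strip_demo/plain"

# --strip-components=1 drops the first path component of every entry.
mkdir -p "$strip_demo/stripped"
tar -xzf "$strip_demo/vep.tar.gz" -C "$strip_demo/stripped" --strip-components=1

ls "$strip_demo/plain"     # lists: homo_sapiens
ls "$strip_demo/stripped"  # lists: 113_GRCh38
```

In the stripped case the species level is gone from the extraction directory, which is the layout problem described above. Whether a given tarball is affected depends on how it was packed.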

github-actions · 2026-05-07T15:42:35Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 2711c2d

✅ 98 tests passed
❔ 57 tests were ignored
❗ 14 tests had warnings

Details

❗ Test warnings:

files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
files_exist - File not found: ro-crate-metadata.json
readme - README did not have an nf-core template version badge.
readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in main.nf.test: Once you have added the required tests, please run the following command to build this file:
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.

❔ Tests ignored:

files_exist - File is ignored: .editorconfig
files_exist - File is ignored: .github/.dockstore.yml
files_exist - File is ignored: .github/CONTRIBUTING.md
files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File is ignored: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File is ignored: .github/actions/get-shards/action.yml
files_exist - File is ignored: .github/actions/nf-test/action.yml
files_exist - File is ignored: .github/workflows/branch.yml
files_exist - File is ignored: .github/workflows/ci.yml
files_exist - File is ignored: .github/workflows/linting.yml
files_exist - File is ignored: .github/workflows/linting_comment.yml
files_exist - File is ignored: .github/workflows/nf-test.yml
files_exist - File is ignored: .prettierignore
files_exist - File is ignored: .prettierrc.yml
files_exist - File is ignored: CHANGELOG.md
files_exist - File is ignored: CITATIONS.md
files_exist - File is ignored: CODE_OF_CONDUCT.md
files_exist - File is ignored: LICENSE
files_exist - File is ignored: assets/email_template.html
files_exist - File is ignored: assets/email_template.txt
files_exist - File is ignored: assets/nf-core-twistcgp_logo_light.png
files_exist - File is ignored: assets/sendmail_template.txt
files_exist - File is ignored: conf/igenomes.config
files_exist - File is ignored: conf/igenomes_ignored.config
files_exist - File is ignored: conf/test_full.config
files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_dark.png
files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_light.png
files_exist - File is ignored: docs/output.md
files_exist - File is ignored: docs/README.md
files_exist - File is ignored: docs/usage.md
nextflow_config - nextflow_config
nf_test_content - nf_test_content
files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/.dockstore.yml
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/config.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File does not exist: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File does not exist: assets/sendmail_template.txt
files_unchanged - File ignored due to lint config: assets/nf-core-twistcgp_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_nf_test - actions_nf_test
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/twistcgp/twistcgp/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
rocrate_readme_sync - rocrate_readme_sync

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: nf-test.config
files_exist - File found: tests/default.nf.test
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-twistcgp_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowTwistcgp.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
pipeline_if_empty_null - No ifEmpty(null) strings found
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: twistgp_ci.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - ALIGNBAM found in conf/modules.config and Nextflow scripts.
modules_config - BCFTOOLS_VIEW_PRE_CIVIC found in conf/modules.config and Nextflow scripts.
modules_config - BCFTOOLS_VIEW_POST_CIVIC found in conf/modules.config and Nextflow scripts.
modules_config - BWAMEM2_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - CIVICPY_UPDATE_CACHE found in conf/modules.config and Nextflow scripts.
modules_config - CIVICPY_ANNOTATE_VCF found in conf/modules.config and Nextflow scripts.
modules_config - CNVKIT_BATCH found in conf/modules.config and Nextflow scripts.
modules_config - UNTAR_VEP_CACHE found in conf/modules.config and Nextflow scripts.
modules_config - ENSEMBLVEP_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - ENSEMBLVEP_VEP found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_MUTECT2 found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_FILTERMUTECTCALLS found in conf/modules.config and Nextflow scripts.
modules_config - FASTP found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - FGBIO_FASTQTOBAM found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSORPRO_SCAN found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSOR2_MSI found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSORPRO_PRO found in conf/modules.config and Nextflow scripts.
modules_config - PERBASE found in conf/modules.config and Nextflow scripts.
modules_config - PICARD found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_COLLECTHSMETRICS found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_COLLECTMULTIPLEMETRICS found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_INTERVALLISTTOBED found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_MARKDUPLICATES found in conf/modules.config and Nextflow scripts.
modules_config - SAMTOOLS_FAIDX found in conf/modules.config and Nextflow scripts.
modules_config - SAMTOOLS_DICT found in conf/modules.config and Nextflow scripts.
modules_config - SNPEFF_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - SNPEFF_SNPEFF found in conf/modules.config and Nextflow scripts.
modules_config - TMB found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - TWISTCGP found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_POPULATION_GERMLINE found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_PON found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_COSMIC found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_GNOMAD found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.3.2

Run details

nf-core/tools version 3.3.2
Run at 2026-06-08 19:50:08

emmcauley · 2026-05-07T17:42:36Z

@coderabbitai review

coderabbitai · 2026-05-07T17:42:58Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-07T17:53:29Z

📝 Walkthrough

Walkthrough

The PR adds support for accepting the Ensembl VEP cache as either a pre-extracted directory or a .tar.gz archive. A new UNTAR_VEP_CACHE module extracts tarballs automatically before VEP runs. The main workflow refactors cache channel construction to branch conditionally: .tar.gz inputs are extracted via the module, while directory paths are used directly or fall back to the annotation database output. The documentation and parameter schema are updated to describe both accepted input modes.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title clearly summarizes the main change: pipeline now accepts VEP cache as either a tarball or directory.
Description check	✅ Passed	Description explains the feature, links to the related issue, and provides context for implementation choice.
Linked Issues check	✅ Passed	Changes fully implement issue `#96`'s requirement: pipeline now accepts VEP cache as .tar.gz files or directories with automatic extraction.
Out of Scope Changes check	✅ Passed	All changes directly support the VEP cache tarball/directory feature. Documentation, schema, module config, and new UNTAR_VEP_CACHE module are all on-scope.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

main.nf (1)
134-134: ⚡ Quick win

ensemblvep_cache take parameter is declared but never used.

The FULCRUMGENOMICS_TWISTCGP workflow accepts ensemblvep_cache as a take: input (line 134), but the new if/else block reads params.ensemblvep_cache directly and never references the channel parameter. This makes the take parameter dead code and couples the inner workflow to global params, reducing reusability.

Consider either: (a) using the ensemblvep_cache channel parameter in the if/else logic and removing the direct params access, or (b) removing the ensemblvep_cache take parameter entirely and updating the caller at line 97.

Also applies to: 170-181
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.nf` at line 134, The workflow FULCRUMGENOMICS_TWISTCGP declares an input
channel ensemblvep_cache but the if/else logic reads params.ensemblvep_cache
directly, making the channel unused; update the conditional and downstream uses
to consume the ensemblvep_cache channel instead of params.ensemblvep_cache (e.g.
use ensemblvep_cache.first() or check ensemblvep_cache.empty? as appropriate) so
the workflow is driven by its input channel, and apply the same change to the
duplicate logic around the block that mirrors lines 170-181; alternatively, if
you prefer the global param, remove the ensemblvep_cache take: declaration and
update callers accordingly—pick one approach and make the code consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@main.nf`:
- Around line 171-175: The UNTAR_VEP_CACHE process is receiving a single-element
list because `.collect()` wraps the 2-element tuple emitted by
Channel.fromPath(...).map { [[id: 'vep_cache'], it] } into a list; remove the
`.collect()` so the channel supplies a destructurable 2-element tuple to
UNTAR_VEP_CACHE. In short: change the code that builds the input channel for
UNTAR_VEP_CACHE (the Channel.fromPath(params.ensemblvep_cache).map { [[id:
'vep_cache'], it] } expression) to omit `.collect()` so UNTAR_VEP_CACHE receives
tuple val(meta), path(archive) as expected.

In `@modules/nf-core/untar/main.nf`:
- Around line 53-62: The single-top-level-dir branch currently writes files
using the original archive paths so ${prefix} stays empty; change the loop
handling when the test on ${archive} is true to strip the first path component
from each archive entry and create files/dirs under ${prefix} (i.e., derive a
variable like stripped=$(echo "${i}" | sed -E 's#^[^/]+/##') and then use
${prefix}/$stripped for mkdir -p and touch), ensuring directories and files are
created inside ${prefix} rather than at the archive root; update the branch that
iterates over tar -tf ${archive} to use this stripped path logic for both files
and directories.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 991e27e3-1833-42cf-a8da-2b8298feee49

📥 Commits

Reviewing files that changed from the base of the PR and between 65969ea and 3b5fc95.

⛔ Files ignored due to path filters (1)

modules/nf-core/untar/tests/main.nf.test.snap is excluded by !**/*.snap

📒 Files selected for processing (8)

docs/variant_annotation.md
main.nf
modules.json
modules/nf-core/untar/environment.yml
modules/nf-core/untar/main.nf
modules/nf-core/untar/meta.yml
modules/nf-core/untar/tests/main.nf.test
nextflow_schema.json

znorgaard · 2026-05-26T22:29:57Z

    // MODULE: PERBASE
    //
-    PERBASE(ALIGNBAM.out.bam_bai, ch_fasta.join(ch_fasta_fai).first())
+    PERBASE(ALIGNBAM.out.bam_bai, ch_fasta.join(ch_fasta_fai))


Why'd we drop the .first() here?

I believe .join returns a queue channel, so without .first() this will only process the first sample.

Is this doing what we want? It also seems unrelated to the PR.

znorgaard · 2026-06-04T17:03:44Z

+            channel.fromPath(params.ensemblvep_cache)
+                .map { it -> [[id: 'vep_cache'], it] }
+                .collect()
+        )


Coderabbit is correct on the bug, but its solution leaves this as a queue channel and so is wrong in a different way.

Suggested change

-            channel.fromPath(params.ensemblvep_cache)
-                .map { it -> [[id: 'vep_cache'], it] }
-                .collect()
-        )
+            channel.fromPath(params.ensemblvep_cache)
+                .collect { it -> [[id: 'vep_cache'], it] }
+        )

or

Suggested change

-            channel.fromPath(params.ensemblvep_cache)
-                .map { it -> [[id: 'vep_cache'], it] }
-                .collect()
-        )
+            channel.value(file(params.ensemblvep_cache))
+                .map { it -> [[id: 'vep_cache'], it] }
+        )

znorgaard · 2026-06-04T17:13:50Z

+                .map { it -> [[id: 'vep_cache'], it] }
+                .collect()
+        )
+        ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()


Suggested change

-        ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()
+        ch_vep_cache = UNTAR_VEP_CACHE.out.cache.first()

Same idea as above.

However, I think Nextflow will actually implicitly preserve value channel status if all the inputs to a process are value channels. I don't see that documented anywhere, though, so we shouldn't rely on it.

    process VALUE_IN_VALUE_OUT {
        input:
        tuple val(meta), path(some_file)

        output:
        tuple val(meta), path("output.txt"), emit: output

        script:
        """
        touch output.txt
        """
    }

    workflow {
        ch = channel.fromPath("somefile.txt")
            .collect { it -> [[id: "file"], it] }
        print(ch)
        ch.view()
        VALUE_IN_VALUE_OUT(ch)
        ch2 = VALUE_IN_VALUE_OUT.out.output
        print(ch2)
        ch2.view()
    }

    DataflowVariable(value=null)
    DataflowVariable(value=null)
    executor > local (1)
    [fb/41b205] VALUE_IN_VALUE_OUT | 1 of 1 ✔
    [['id':'file'], /.../somefile.txt]
    [[id:file], /.../work/fb/41b2054051393a8611a540b98dd9d5/output.txt]

znorgaard · 2026-06-04T17:14:47Z

+        ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()
+    } else {
+        ch_vep_cache = params.ensemblvep_cache
+            ? channel.fromPath(params.ensemblvep_cache).map { it -> [[id: 'vep_cache'], it] }.collect()


Suggested change

-            ? channel.fromPath(params.ensemblvep_cache).map { it -> [[id: 'vep_cache'], it] }.collect()
+            ? channel.fromPath(params.ensemblvep_cache).collect { it -> [[id: 'vep_cache'], it] }

emmcauley · 2026-06-08T19:52:27Z

I ran this branch with both an uncompressed directory and a vep.tar.gz. Here's how the cache path resolves:

tarball

Cache: /private/tmp/nf_work_vep_tarball/37/37d02b0f04fb8eefeb155756dc16b6/cache/homo_sapiens/113_GRCh38; homo_sapiens_core_113_38 on ensembldb.ensembl.org

directory

Cache: /private/tmp/nf_work_vep_dir/c8/d98dc0180fa9230e7aada185897bb8/vep/homo_sapiens/113_GRCh38; homo_sapiens_core_113_38 on ensembldb.ensembl.org
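Both resolved paths end in `<cache root>/homo_sapiens/113_GRCh38`, i.e. the species directory sits directly under the cache root. A rough way to check that layout by hand is sketched below. The function name `check_vep_cache_layout` and the demo directories are hypothetical and are not part of this PR or of VEP itself.

```shell
#!/bin/sh
set -u

# Hypothetical helper: succeed only if <cache_root>/<species>/<version_assembly>
# exists as a directory.
check_vep_cache_layout() {
    cache_root=$1
    species=$2
    version_assembly=$3
    if [ -d "$cache_root/$species/$version_assembly" ]; then
        echo "ok: $cache_root/$species/$version_assembly"
    else
        echo "missing: $cache_root/$species/$version_assembly" >&2
        return 1
    fi
}

# Demo against a throwaway directory that imitates the layout reported above.
layout_demo=$(mktemp -d)
mkdir -p "$layout_demo/cache/homo_sapiens/113_GRCh38"

check_vep_cache_layout "$layout_demo/cache" homo_sapiens 113_GRCh38
# A species that was never created in the demo directory is reported as missing.
check_vep_cache_layout "$layout_demo/cache" mus_musculus 113_GRCm39 \
    || echo "layout check failed as expected"
```

This only checks directory presence; it says nothing about whether the cache contents are valid for a given VEP version.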

znorgaard · 2026-06-08T20:59:59Z

@coderabbitai review

coderabbitai · 2026-06-08T21:00:05Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

main.nf (1)
173-174: ⚡ Quick win

Use consistent pattern for directory paths.

The tarball branch (lines 168-169) uses channel.value(file(...)).map {...} while the directory branch uses channel.fromPath(...).collect {...}. Both produce value channels, but the inconsistent pattern is confusing. Also, fromPath is designed for glob patterns; for directory paths, file() wrapped in channel.value() is more explicit.
♻️ Align with tarball branch pattern
     ch_vep_cache = params.ensemblvep_cache
-        ? channel.fromPath(params.ensemblvep_cache).collect { it -> [[id: 'vep_cache'], it] }
+        ? channel.value(file(params.ensemblvep_cache)).map { it -> [[id: 'vep_cache'], it] }
         : PREPARE_ANNOTATION_DB.out.ensemblvep_cache
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.nf` around lines 173 - 174, ch_vep_cache uses channel.fromPath(...) for
the directory branch while the tarball branch uses
channel.value(file(...)).map(...); change the directory branch to the same
explicit pattern by wrapping params.ensemblvep_cache with file(...) and
channel.value(...) and using .map to produce the [[id:'vep_cache'], it] tuple so
both branches use the same value-channel approach (refer to ch_vep_cache and
params.ensemblvep_cache).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modules/local/untar_vep_cache/main.nf`:
- Around line 22-25: After extracting ${archive} into _tmp, validate that there
is exactly one top-level entry before running mv: list entries in _tmp (the
current code's top_level capture), count them, and if the count != 1 emit an
error (including the unexpected entries) and exit non-zero; only when count == 1
proceed to set top_level and mv "_tmp/${top_level}" cache. Reference the
existing variables/commands top_level, _tmp, ${archive}, tar and mv to locate
where to add this validation.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dcdcf3da-ff65-4283-9be8-ca4b82e0dab5

📥 Commits

Reviewing files that changed from the base of the PR and between 3b5fc95 and 2711c2d.

📒 Files selected for processing (8)

CHANGELOG.md
conf/modules.config
docs/variant_annotation.md
main.nf
modules/local/untar_vep_cache/environment.yml
modules/local/untar_vep_cache/main.nf
modules/local/untar_vep_cache/meta.yml
nextflow_schema.json

✅ Files skipped from review due to trivial changes (4)

CHANGELOG.md
modules/local/untar_vep_cache/environment.yml
modules/local/untar_vep_cache/meta.yml
docs/variant_annotation.md

🚧 Files skipped from review as they are similar to previous changes (1)

nextflow_schema.json

coderabbitai · 2026-06-08T21:07:04Z

+    mkdir -p _tmp
+    tar -xzf ${archive} -C _tmp
+    top_level=\$(ls _tmp | head -1)
+    mv "_tmp/\${top_level}" cache


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate exactly one top-level entry after extraction.

The script assumes the tarball contains exactly one top-level directory. If the archive has multiple top-level entries, only the first is moved and the others are silently lost. If there are zero entries or the structure is otherwise unexpected, mv fails.

🛡️ Add validation before moving

     mkdir -p _tmp
     tar -xzf ${archive} -C _tmp
+    entry_count=\$(ls -A _tmp | wc -l)
+    if [ \$entry_count -ne 1 ]; then
+        echo "Error: Expected exactly one top-level entry in tarball, found \$entry_count" >&2
+        exit 1
+    fi
     top_level=\$(ls _tmp | head -1)
     mv "_tmp/\${top_level}" cache

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    mkdir -p _tmp
-    tar -xzf ${archive} -C _tmp
-    top_level=\$(ls _tmp | head -1)
-    mv "_tmp/\${top_level}" cache
+    mkdir -p _tmp
+    tar -xzf ${archive} -C _tmp
+    entry_count=\$(ls -A _tmp | wc -l)
+    if [ \$entry_count -ne 1 ]; then
+        echo "Error: Expected exactly one top-level entry in tarball, found \$entry_count" >&2
+        exit 1
+    fi
+    top_level=\$(ls _tmp | head -1)
+    mv "_tmp/\${top_level}" cache

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modules/local/untar_vep_cache/main.nf` around lines 22 - 25, After extracting ${archive} into _tmp, validate that there is exactly one top-level entry before running mv: list entries in _tmp (the current code's top_level capture), count them, and if the count != 1 emit an error (including the unexpected entries) and exit non-zero; only when count == 1 proceed to set top_level and mv "_tmp/${top_level}" cache. Reference the existing variables/commands top_level, _tmp, ${archive}, tar and mv to locate where to add this validation.
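For trying the "exactly one top-level entry" check outside Nextflow, a standalone sketch follows. It is plain shell, so the `\$` escaping needed inside a Nextflow script block is dropped. The helper name `move_single_top_level` and the demo directories are hypothetical, and `find -mindepth/-maxdepth` is assumed to be available (GNU, BSD, and BusyBox find all provide it).

```shell
#!/bin/sh
set -u

# Hypothetical helper: move the only top-level entry of src_dir to dest,
# or fail without moving anything when the entry count is not exactly one.
move_single_top_level() {
    src_dir=$1
    dest=$2
    # Count top-level entries, dotfiles included; tr strips wc's padding.
    entry_count=$(find "$src_dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
    if [ "$entry_count" -ne 1 ]; then
        echo "Error: expected exactly one top-level entry in $src_dir, found $entry_count" >&2
        return 1
    fi
    top_level=$(find "$src_dir" -mindepth 1 -maxdepth 1)
    mv "$top_level" "$dest"
}

# Demo: one extraction directory with a single entry, one with two.
mv_demo=$(mktemp -d)
mkdir -p "$mv_demo/ok/vep/homo_sapiens" "$mv_demo/bad/a" "$mv_demo/bad/b"

move_single_top_level "$mv_demo/ok" "$mv_demo/cache_ok"
move_single_top_level "$mv_demo/bad" "$mv_demo/cache_bad" || echo "rejected: nothing was moved"
```

Counting with `find` rather than `ls | head -1` means a second entry is reported instead of silently dropped. File names containing newlines would still confuse the count, which seems acceptable for a cache tarball.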

emmcauley added 2 commits March 19, 2026 15:40

init

1b8f494

feat: allow skipping of downstream analyses (#85)

65969ea

emmcauley force-pushed the em_vep_cache branch from c0facc3 to 3b5fc95 Compare May 7, 2026 15:55

coderabbitai Bot requested changes May 7, 2026

Comment thread main.nf

Comment thread modules/nf-core/untar/main.nf Outdated

emmcauley mentioned this pull request May 7, 2026

feature branch: nice-to-have features #84

Draft

7 tasks

emmcauley force-pushed the em_vep_cache branch 2 times, most recently from 6b09a9b to e0939bf Compare May 14, 2026 19:19

feat: pipeline accepts vep.tar.gz or vep/ dir

cdc8ed0

emmcauley force-pushed the em_vep_cache branch from e0939bf to cdc8ed0 Compare May 20, 2026 18:35

emmcauley assigned znorgaard May 22, 2026

znorgaard requested changes Jun 4, 2026

znorgaard assigned emmcauley and unassigned znorgaard Jun 4, 2026

fix: reviewer comments

2711c2d

coderabbitai Bot requested changes Jun 8, 2026

emmcauley force-pushed the em_nice_to_haves branch from f369e2e to 406e13b Compare June 11, 2026 15:53
